The first column includes the line number in the original input file. The second column
shows the functional consequences of the variant. The possible consequences are
nonsynonymous SNV, synonymous SNV, frameshift insertion, frameshift deletion, nonframeshift
insertion, nonframeshift deletion, frameshift block substitution, or nonframeshift block
substitution. The third column includes the gene name, the transcript identifier, and the
sequence change in the corresponding transcript.
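As a quick check on such an output, the consequence types in the second column can be tallied with standard Unix tools. The following is only a minimal sketch: the helper name "count_consequences" is hypothetical, and it assumes the file is tab-delimited with the consequence in column 2, as described above.

```shell
# Hypothetical helper (not part of ANNOVAR): tally the functional
# consequences found in column 2 of a tab-delimited annotation file.
# Assumption: the file follows the three-column layout described above.
count_consequences() {
    # cut keeps column 2; sort + uniq -c count each distinct value;
    # the final sort lists the most frequent consequence first.
    cut -f2 "$1" | sort | uniq -c | sort -k1,1nr
}
```

Calling the function with the path of the exonic annotation file as its only argument prints one line per consequence type, preceded by its count.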
The “annotate_variation.pl” tool has numerous arguments. Use “annotate_variation.pl
-h” to display the complete list of arguments.
ANNOVAR provides the “table_annovar.pl” script as an easy way to annotate variants
directly from a VCF file, with no need to convert the VCF file into an ANNOVAR input file
first. It takes a VCF file as input and generates a tab-delimited output file with many
columns, each representing one set of annotations. It also generates a new output VCF file
with the INFO field filled with annotation information.
./table_annovar.pl ../input/humanSNP.vcf humandb/ \
-buildver hg19 \
-out ../output/humanSNP2 \
-remove \
-protocol refGene,cytoBand,exac03,avsnp147,dbnsfp30a \
-operation g,r,f,f,f \
-nastring . \
-vcfinput
The “-remove” option removes all temporary files. The “-protocol” option is a
comma-delimited string that specifies the annotation protocols. These strings typically
represent database names in ANNOVAR. The “-operation” option tells ANNOVAR which operation
to use for each of the protocols, one per protocol and in the same order, where “g” means
gene-based, “gx” means gene-based with cross-reference annotation (from the -xref argument),
“r” means region-based, and “f” means filter-based. The above ANNOVAR command generates
three output files: “humanSNP2.avinput”, “humanSNP2.hg19_multianno.txt”, and
“humanSNP2.hg19_multianno.vcf”. The first one is an ANNOVAR input file, the second one is
the annotation file with the annotation columns, and the last one is a VCF file with the
annotations added to the INFO field. Open each of these files and study their contents.
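One way to study the tab-delimited annotation file is to list its column headers and then tally the values in a column of interest. The two helpers below are a hedged sketch, not part of ANNOVAR: the names "list_columns" and "tally_column" are hypothetical, and the sketch assumes that the first line of the file is a header row.

```shell
# Hypothetical helper: print the header fields of a tab-delimited
# table, one per line, each preceded by its column number.
list_columns() {
    head -n 1 "$1" | tr '\t' '\n' | nl
}

# Hypothetical helper: count how often each value occurs in the column
# whose header matches the second argument exactly.
tally_column() {
    awk -F'\t' -v col="$2" '
        NR == 1 {
            # Locate the requested column in the header line.
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { n[$c]++ }                                  # count each value
        END { for (v in n) print n[v] "\t" v }       # count, then value
    ' "$1" | sort -k1,1nr
}
```

For example, “list_columns ../output/humanSNP2.hg19_multianno.txt” shows which annotation columns were produced, and “tally_column ../output/humanSNP2.hg19_multianno.txt Func.refGene” summarizes one of them. The column name “Func.refGene” is an assumption here; use whichever header the first command reports.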
We can also try the annotation database that we created for SARS-CoV-2 by annotating
“sarscov2.vcf”, which was generated in a previous variant calling example. You can
copy it to the “input” directory for easy use and then annotate it using the following
command:
./table_annovar.pl ../input/sarscov2.vcf sarscov2db/ \
-buildver SARSCOV2 \
-out ../output/cov2SNP \
-remove \
-protocol refGene \